Investigating the potential of Juglans regia phytoconstituents for the treatment of cervical cancer utilizing network biology and molecular docking approach

The fourth most frequent type of cancer in women and the leading cause of mortality for females worldwide is cervical cancer. Traditionally, medicinal plants have been utilized to treat various illnesses and ailments. The molecular docking method is used in the current study to look into the phytoconstituents of Juglans regia’s possible anticancer effects on cervical cancer target proteins. This work uses the microarray dataset analysis of GSE63678 from the NCBI Gene Expression Omnibus database to find differentially expressed genes. Furthermore, protein-protein interactions of differentially expressed genes were constructed using network biology techniques. The top five hub genes (IGF1, FGF2, ESR1, MYL9, and MYH11) are then determined by computing topological parameters with Cytohubba. In addition, molecular docking research was performed on Juglans regia phytocompounds that were extracted from the IMPPAT database versus hub genes that had been identified. Utilizing molecular dynamics, simulation confirmed that prioritized docked complexes with low binding energies were stable.


Introduction
Juglans regia also known as English or Persian walnut, is a member of the Juglandaceae family exhibiting therapeutic potential against coronary heart disease, rheumatoid arthritis, cancer, and diabetes [1].It is densely cultivated in Asia, Central Europe, and the United States.It is considered a reputable source of nutrients and phytochemicals that greatly benefit human health, such as polyphenols, proteins, fibers, sterols, and essential fatty acids [2].Several invitro studies validated the anticancer potential of Juglans regia with efficiency [3].Natural phytocompounds, due to their lesser side effects and cost efficiency, are considered potential leads for drug discovery.Cervical cancer is the fourth most common cause of cancer and the fourth most common cause of death among women [4].Cervical cancer is a female malignant tumor found in cervix cells located at the lower part of the uterus connecting to the vagina, caused

Dataset collection and pre-processing
The microarray dataset from the GEO database of GSE accession number GSE63678 related to cervical cancer, endometrial cancer, and vulvar cancer was mined, including samples into six categories (cervical cancer tissues/cells, normal cervical tissue/ cells, vulvar cancer tissues/cells, normal vulvar tissue/cells, endometrial cancer tissue/cells, normal endometrial tissue/ cells).The selection was based on the study type (profiling by array) and studies including all types of related gynecological cancer.Pre-processing initiates with data normalization, and the log was used to lower the values due to their wide range.Following sample clustering, Principal Component analysis was performed.Principal component analysis explains how genes are related or not related to one another via 2D graph representation.While clustering, a correlation matrix was generated with coefficients ranging between 0 and 1.It is an excellent way to predict gene functions because genes that share a biological process are frequently co-regulated utilizing heatmap [12].The LIMMA package in R enables gene expression differential analysis.The package supports twocolor spotted array preprocessing.The DEGs are screened in LIMMA by building a linear model and estimating with the Bayes-T test [13].The p-values and differential expression statistics were calculated using empirical Bayes.To properly visualize the results of the DE analysis, a volcano plot was generated, highlighting only the top 20 genes based on their p-values < 0.05 and logFC values >0.5.R codes and related generated figures are available in S1 File.

Protein-protein interaction
Using the STRING database and a high confidence score of 0.700 as the cutoff condition, a PPI network was built to determine the degree of similarity between genes at default parameters.STRING's network and enrichment facilities thoroughly characterize user gene lists and functional genomics datasets and create and share highly customized and augmented protein-protein association networks [14].Each interaction depicts two non-identical proteins produced by a different protein-coding gene locus.Cytoscape is a freely available visualization platform that enables the computation of network topological parameters and hub gene identification [15].The Molecular COmplex DEtection (MCODE) algorithm (version 1.5.1) is utilized (maximum depth = 100, node score = 0.2, and K-core = 220) for the computation of subnetworks and gene-enriched modules within the primary network [16].

Hub genes identification
Cytohubba, a Cytoscape plugin, was used to compute the MCC score for each node in the network [17].This study designated the genes with the highest MCC values as hub genes.As a result, these plugins aided in the identification of closely related genes.StringApp, another cytoscape app, was used to import the network directly from the STRING database online.

Identification of potential chemical leads
The Indian Medicinal Plants, Phytochemistry And Therapeutics (IMPPAT) database [18], the most significant resource on phytochemicals of Indian herbs, was utilized to iterate the potential phytochemical extracts from the bark, flower, fruit, and leaf, seeds, root and stem of Juglans regia.A ligand library of 1004 phytochemicals from the IMPATT database and literature studies was generated to identify potential chemical leads having the potential to inhibit the prioritized hub genes.Deletion of phytocompounds was performed manually and resulted in 210 screened compounds.Lipinski's rule of five (RO5) [19], a widely used drug-likeness measure, was used to screen the 210 phytochemical ligand library for potential druglike molecules.One thousand-four phytochemicals passed the R05 drug-likeness filter.The 3D structures of the druglike phytochemicals were then energy-minimized using the OpenBabel toolbox's obminimize [20].Finally, OpenBabel was used to transform the ligands' energy-minimized 3D structures.sdfto.pdb format.

ADME/T selection
The ADME/T (absorption, distribution, metabolism, excretion, and toxicity) qualities play a significant part in drug filtering when drug-likeness is determined by assessing current drug candidates' physiochemical attributes and structural aspects.During drug designing, it is vital to predict the situation and movement of a drug in the human body [21].Medicinal development costs can be decreased, and the process' overall success rate can be increased by predicting the ADME/T characteristics of drug compounds before drug design.Drugs are transported into the circulatory system by absorption.The medicine is distributed when it crosses the cell membrane barrier and enters numerous tissues, organs, or bodily fluids [22].The initial (parent) substance undergoes metabolism and is changed into new chemicals known as metabolites.Redox enzymes, often known as cytochrome P450 enzymes, process the majority of small-molecule drug metabolism in the liver [23].Excretion is when a drug's primary form and associated metabolites are expelled from the body.The drug's toxicity also impacts the human body.Using admetSAR 2.0, screened compounds' ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profiles were created [24].This web-based application identifies the ADMET qualities of screened input compounds with structural and functional similarity.It contains manually curated data on known chemical compounds [19].

Molecular docking
The molecular docking of energy-minimized 3D ligand and target protein was carried out using AutoDock version 4.2 [25].The respective Python script prepared for ligands and protein structures from AutoDockTools was used to transform their 3D structures.pdb format to.pdbqt format.Crystallographic structures provide atomic-level information about the protein's three-dimensional conformation.Thanks to this high-resolution information, the docking simulation will precisely depict the spatial arrangement of amino acids and other residues within the binding site.Blind docking is performed to identify the best possible docking pose [21,26].By assessing the critical residues in the target proteins, such as the catalytic residues and substrate binding residues, which are crucial for the function and specificity of the considered proteases, the appropriate grid box specified by the search space center and search space size was manually determined for identified hub genes at the individual level [27,28].With the active site in the middle of the grid box, the docking simulations were performed using the Lamarckian Genetic algorithm (LGA) [29,30].Each ligand's binding energy conformation was calculated using the default settings of 2,500,000 energy assessments, 27,000 generations, 150 for population size, 0.02 for gene mutation rate, and 0.8 for crossover rate of 10 runs.Visualization was performed by using PyMOL [31].

Molecular dynamics simulation
MD simulations for the potential inhibitors of hub genes were performed to evaluate the stability of their protein-ligand complexes using GROMACS 5.1.5and the GROMOS96 54a7 force field [32,33].The topology for the top inhibitors was generated using the Automated Topology Builder (ATB) version 3.0 (atb.uq.edu.au).After being positioned in the middle of a cubic periodic box, the protein-ligand combination was solvated by adding simple point charge (SPC) fluids.The system's net charge was then balanced by adding Na+ and Cl-ions.Utilizing the steepest descent algorithm, energy was reduced.The system was then heated to 310 K during a 500 ps with 2 fs endless number of particle, volume, and temperature (NVT) simulation [34].The pressure was then increased to 1 bar during a 500 ps, 2 fs, constant number of particle, pressure, and temperature (NPT) simulation.Protein and ligand were both position-restricted in the simulations mentioned above.The position constraint was then released, and a production MD run was run for 100 ns with a 2 fs time step.The structural coordinates were saved every 2 ps during the MD simulation.The v-rescale temperature was maintained at 310 K. Using the Parrinello-Rahman pressure coupling approach, and the pressure was kept at 1 bar [35].The temperature and pressure coupling's time constants were maintained at 0.1 ps and 2 ps, respectively.The long-range electrostatic interactions were calculated using the particle mesh Ewald (PME) method with fourth-order cubic interpolation and 0.16 nm grid spacing.
In contrast, the short-range interactions were computed for the atom pairs within the cutoff of 1.4 nm.The LINCS approach restricted all bonds [36].Using GROMACS scripts, the trajectories derived from the MD simulations were then utilized to compute and assess the protein's radius of gyration (Rg), root mean square deviation (RMSD), and root mean square fluctuation (RMSF) of the protein backbone Cα-atoms.

Identification of DEGs
The dataset from the GEO-NCBI database was analyzed using the R programming language to identify DEGs.LIMMA, ggplot2, GEOquery, and heat map were the R packages used for dataset analysis and visualization.The selection was based on the study type (profiling by array) and studies including all types of related gynecological cancer.The LIMMA package was utilized to analyze gene expression microarray datasets and a linear model to analyze bio-assays.Further, internal normalization, calculating the geometric mean for each gene, was performed.The Geoquery package assisted in retrieving data directly from the GEO-NCBI database.The heatmap library aids in the visualization of gene expression and clustering.The table enlists pvalues and logFC values for all differentially expressed genes.Gene IDs with logFC values greater than 0.5 were considered upregulated genes, resulting in 207 in this study (S1 Table ).

PPI network construction
Following the discovery of the upregulated genes, the upregulated genes were subjected to a STRING database, resulting in a network displaying multiple gene connections.The network contains proteins with the highest degrees, essential for finding hub genes and interactions between different proteins.The StringApp plugin in Cytoscape was utilized for the network interaction visualization (Fig 2).

Identification of hub-genes
The generated PPI network had 207 nodes, 253 edges, 3.2 average node degree, 0.

Screening of potential phytochemicals
Lipinski's rule of five (RO5) was used to compute the physicochemical parameters of the phytocompounds for five drug targets to assess their druglike characteristics.The molecular weight must be less than or equal to 500 g/mol, the number of hydrogen bond donors must be no more than 5, the number of hydrogen bond acceptors must be no more than 10, and the log p-value must be no more than five.One rule violation in a lead candidate is permissible.(S2 Table ) displays the top hit phytochemicals and the reference compound's anticipated druglike characteristics-all disclosed ligands displayed good druglike properties.Different pharmacokinetic features were predicted using admetSAR (The absorption, distribution, metabolism, elimination (ADME), and toxicity of the top medication candidate molecules can be predicted using pharmacokinetic variables.(S3 Table) displays both targets' ADMET traits of the generated phytochemicals.Due to their poor pharmacokinetic qualities and toxicity, many medications must incorporate this method in their drug development.Early drug discovery relies on high-performance and quick ADMET profiling assays to identify active lead compounds [37].According to the ADMET profiling, all candidate compounds had no adverse effects upon absorption.The associated ADMET and physiochemical properties (S4 Table ) of potential compounds for different models, such as P-glycoprotein substrates, BBB penetration, and gastrointestinal absorption, showed positive results that strongly support the compounds' suitability as drug candidates.

Molecular docking
The intermolecular interactions among proteins and ligands were analyzed for computing binding energies of protein-ligand complexes using AutoDock v 4.2.The structure-based virtual screening prioritized FGF2-myrcene, IGF1-Juglone, and FGF2-Juglanin as prioritized docked complexes.PubChem database was mined for the retrieval of chemical structures of the phytocompounds.The binding interactions of drug targets with their respective prioritized  1. Table 2 summarizes the distances and residues involved in the interaction between the target proteins and the significant ligands.Contacting residues in each docked complex are also indicated.

Discussion
In the entire world, cervical cancer affects women more frequently than any other cancer.Primary prevention and screening are the best strategies for reducing the cost of care and mortality linked to cervical cancer.Commercially available drugs were experimentally validated for the disease treatment but pose serious ill-effects.Several researchers performed repurposing of drugs, but optimal drug candidates were not prioritized for treating cervical cancer.Therefore, phytoconstituents from herbs were considered for the drug discovery targeting cervical, vulvar, and endometrial cancers.These compounds possess enormous structural and chemical diversity with fewer side effects.By offering drug discovery a systems-level perspective and enabling the identification of potential and more effective therapeutic interventions, the combination of network biology with molecular docking improves drug discovery's precision, depth, and efficiency.In contrast to molecular docking, a computer technique used to forecast the binding interactions between a small molecule (drug candidate) and a protein target, network biology studies biological systems as interconnected networks of molecules.Network biology reveals potential off-target effects or other pathways that may be impacted by drug binding and aids researchers in comprehending the larger context in which a target protein functions.
Although numerous studies have used microarray-based technology to find molecular markers in cervical, endometrial, and vulvar cancers, there have been differing results reports because of patient selection, tissue source, and study designs.Therefore, the present study attempts to identify the prioritized gene signatures underlying cervical, endometrial, and vulvar cancers by a comprehensive meta-analysis of the microarray dataset GSE63678.It identifies 412 differentially expressed genes (207 upregulated and 205 downregulated genes).By uncovering upregulated genes, this study highlights the potential diagnostic and prognostic biomarkers in female reproductive-related cancers and assists in understanding the molecular mechanism of their development and progression.Protein-protein interactions (PPIs) offer various biological processes, cell-to-cell interactions, and metabolic and developmental control [38].The elucidation of protein interaction networks also contributes significantly to analyzing network robustness and stability [39].Topological parameters computation of the network enables the identification of hub genes.It offers the advantage of extracting information from large volumes of high-dimensional data, which assists in identifying novel players involved in multiple proteomic interactions in cervical, endometrial, and vulvar cancer patients.This renders the top five differentially expressed hub genes (Fibroblast Growth Factor 2 (FGF2), Insulin-like Growth Factor 1(IGF1), Estrogen Receptor 1 (ESR1), Myosin Light Chain 9(MYL9), and Myosin Heavy Chain 11 (MYH11).Fibroblast growth factor 2 protein encoded by the FGF2 gene involves various biological processes like tissue repair, embryonic development, tumor growth, and cell survival [40].It is experimentally validated as a potential target for treating gastric ulcers and myelofibrosis [41].Single-chain polypeptides with a high degree of sequence homology to pro-insulin are insulin-like growth factors 1 and 2 [42].It plays a crucial role in the growth and development of many tissues and in regulating overall growth and metabolism.It is an attractive target for treating cervical cancer, diabetes, and inflammation [43].Estrogen Receptor 1 (ESR1) is an attractive drug target for treating breast cancer, myocardial infarction, and migraine [44,45].Myosin light chain 9 (MYL9) and myosin heavy chain (MYH11) play a vital role in immune infiltration, tumor invasion, and metastasis and, therefore, serve as potential targets for cancer treatment [46].
Further, the chemical compounds of Juglans regia, mined from the IMPPAT database, were subjected to ADMET analysis.Screened non-toxic compounds were docked against identified plausible hub genes to identify selective inhibitors with optimal binding energy.This results in identifying phytochemicals from Juglans regia, which will serve as potential leads for drug discovery.Molecular docking analysis prioritizes myrcene, juglone, and juglans as attractive potential chemical leads capable of inhibiting Insulin growth factor 1 and Fibroblast Growth Factor 2. Myrcene, a yellow-colored oily liquid with a flash point of less than 200˚F is insoluble in water [47].It is an octa-1,6-diene monoterpene with methyl and methylene substituents at positions 3 and 7, respectively, with anti-inflammatory properties [48], anti-aging and analgesic properties [49].Myrcene, extracted from the fruit of Juglans regia, is distributed in adipose tissues, liver, kidney, and brain [50] with bioavailability of 30min in human plasma [51].Several studies have documented the antibacterial effects of myrcene against gram-positive bacteria [52], for example, Staphylococcus aureus [53], Escherichia coli [54], Salmonella enterica [55], etc. Myrcene can change the lipid monolayers' fluidity, stability, and morphology [48].Myrcene exhibits antitumor activity against lung cancer cells by inducing oxidative stress and apoptosis [56].In the FGF2-myrcene complex, a binding energy of -9.17 kcal/mol was computed.ILE139, PHE154, ALA199, and GLY203 interact with methylene substituents at position 7. PRO155, ARG202, and GLU201 form hydrogen bonds with C3 and C5 positions.
Juglone (5-hydroxy-1,4-naphtoquinone), an oxygen derivative of naphthalene, is extracted from the leaf of Juglans regia.It is transformed into a toxic compound when exposed to soil or air.Therefore, an appropriate dosage (13.1-1556.0mg/100 g) is required for drug design.Juglone exhibited anticancer effects on the different breast [57], lung [58], prostate [59], cervix and blood cancer models [60].Juglone induced early DNA single-strand damage on human fibroblasts, translating into apoptosis and necrosis [61].Juglone inhibited metastasis development in the same cell line and decreased spheroid invasiveness (C6) [62].Juglone blocks several molecular pathways involved in cancer development, such as the PIK3/Akt cascade mechanism [63].IGF2-Juglone docked complex exhibits a binding energy of -9.02 kcal/mol.CYS109, LEU112, and GLU106 residues of IGF2 interact with the hydroxyl group at the C5 position.TYR79 and PRO111 residues are bonded with the C4 and C5 positions of juglone via hydrogen bond.The presence of van der Waals interactions with TYR79 and ALA110 with the C4 benzene ring group are essential structural requirements for anti-toxicity, also found during the analysis.
Juglanin is a polyphenol extracted from the pericarp of Juglans regia, exhibiting anticancer and anti-inflammatory properties.Apoptosis and autophagy were triggered simultaneously and exerted a synergistic activity when treated with Juglanin [64].Juglanin enhanced the effect of doxorubicin, one of the most commonly used antitumor drugs (dox).It significantly increased the cytotoxic effect of doxo in normal and doxo-resistant A549 cells and normal and cisplatin-resistant H69 cells [65].MAPKs, the proteins that regulate cellular proliferation, differentiation, and apoptosis, are considered an attractive target of Juglanin [66].Docking analysis revealed that MET134, ALA136, SER138, GLY203, and PHE237 play a significant part in the binding of the C-7 position of the hydroxyl group in the benzene ring.Primary H-bonding interaction with the benzene ring (C-5th position of-OH group) could associate with the (Oatom) key chain of GLU238, MET134.
Molecular dynamics simulations were used to examine the stability of protein-ligand complexes further.The protein target's backbone structural framework generated root mean square deviation (RMSD) graphs for time at 100 ns.An average RMSD of 0.511 ± 0.21 nm was computed.RMSD values gained until 4 ns in the case of FGF2-Juglanin Most binding sites are thought to be shaped like alpha-helix, according to the Dictionary of Secondary Structure of Proteins (DSSP), and active sites are thought to be located in the coil region.To understand the flexibility of each residue, protein-ligand docked complexes were studied using residue-based root mean square fluctuations (RMSF).Due to the lack of structural data for the target protein, RMSF values for the initial 15 residues of each docked complex vary considerably (0.42-2.87 nm for each case).Less turbulence at the binding and active site implies that the binding cavity is rigid and intact.The docked complexes' solidity and structural changes are calculated by gmx gyrate's Rg values.It also determines the atomic mass corresponding to the mass centers of the complexes.With no variations after 50000 ps, the average Rg values of FGF2-Myrcene, IGF2-Juglone, and FGF2-Juglanin were (2.36-2.66nm), (2.29-2.89nm), and (2.44-2.56nm), respectively.Furthermore, the firmness of the prioritized protein-ligand complexes is confirmed by the correlation between Rg values and RMSD values of backbone C atoms.The inhibiting potential of Juglans regia towards the treatment of cervical cancer is explored first in this study, keeping in mind the validated anti-cancer properties of Juglans regia.The inclusion of the network biology approach includes the identification of optimal systematic interactions incurring cervical cancer.

Limitations of the study
Integrative bioinformatics involves including computationally intensive methods for discovering and prioritizing novel chemical leads having inhibitory potential against cancer target genes.It mainly depends on the simulated system's size, which ranges from nanoseconds to microseconds.The limited accuracy of the models and the diverse nature of chemical compound libraries result in the prediction of off-targets and multiple targets to mitigate disease progression.Moreover, molecular docking depends on the single receptor conformation adopted by the ligand of interest, which, as a result, limits the potential for exploration of novel specific chemical leads.Therefore, the current study suggests that in-vitro validation inhibits the potential of screened compounds for drug discovery.

Conclusion
Medicinal plants are potential precursors for drug discovery rather than synthetic compounds due to the chances of lesser side effects.Juglans regia is a traditional medicinal herb exhibiting antidiabetic, anticancer, and anti-inflammatory properties.Therefore, this study aims to identify potential phytocompounds with plausible inhibitory potential against cervical cancerrelated drug targets.We identified specific critical genes utilizing protein-protein network analysis and topological parameter computations.This renders potential five genes subjected to molecular docking analysis and MD simulation studies with prioritized phytocompounds.Such compounds will be utilized for drug discovery against cervical cancer.Therefore, it can improve the survival rate and reduce the death rate, which concentrates on suppressing or controlling this gene's function.

Fig 1 .
Fig 1. Representation of the complete workflow.https://doi.org/10.1371/journal.pone.0287864.g001 37 average local clustering coefficient, and 0.0903 PPI enrichment p-value.The created PPI network in this study contains much more acceptable interactions, according to the STRING database's reference value (PPI enrichment p-value = 1.0e-16) (Fig 3).The interaction score of Cyto-Hubba validated the top five hub genes utilizing the MCC algorithm.The algorithm identifies cliques by computing their respective centrality scores within the biological network.Clique refers to the subset of nodes in a network where each node is connected to every other node in the subset.It comprises of four procedural steps: (1) Identification of cliques, (2) Computing the centrality of each participating node based on the connectivity of its nodes, (3) Score-based ranking of the identified cliques, and (4) Identification of hub genes as cliques with higher centrality scores are considered essential for network's structure and function [17].The Cyto-Hubba outcomes of hub genes: Fibroblast Growth Factor 2(FGF2), Insulin-like Growth Factor 1(IGF1), Estrogen Receptor 1 (ESR1), Myosin Light Chain 9(MYL9), and Myosin Heavy Chain 11 (MYH11).